Conversation
✅ Changeset File DetectedThe following changeset entries were found:
Change description: |
|
I'm wondering if there is a way to do this automatically, so we don't even expose any APIs for it, it's just automatic |
| return self._data | ||
|
|
||
| @utils.log_exceptions(logger=logger) | ||
| async def _read_audio_task(self, reader: rtc.ByteStreamReader, participant_id: str): |
There was a problem hiding this comment.
Let's make sure we're reading from the right participant_id? The one that is registered inside the RoomIO
There was a problem hiding this comment.
we don't know which participant listening to at this moment, so maybe we buffer all the pre-connect audios for participants and decide if we want to use them based on the participant id and the received timestamp.
Imagine in a multi-user room, the second user joined the room with the pre-connect audio but the agent is not listening to that user until the set_participant is called, then that pre-connect buffer was actually out of date we should ignore.
There was a problem hiding this comment.
fixed for multiple participants
| async def entrypoint(ctx: JobContext): | ||
| # register a pre-connect audio handler before connecting to the room so that | ||
| # it won't miss the audio buffer (TODO (long): does this makes sense?) | ||
| pre_connect_audio = PreConnectAudioHandler(ctx.room).register() |
There was a problem hiding this comment.
@theomonnom we can do that automatically if this line can run after the ctx.connect(), basically register the byte stream handler inside the room io after the ctx.connect(). I am not sure if we are going to miss the stream reader if the ctx connect will auto subscribe the audio track.
There was a problem hiding this comment.
@bcherry to enable it automatically, is that okay to register the handler after ctx.connect, what will happen if the handler is registered after the byte stream sent?
There was a problem hiding this comment.
My concern is that if we move the registering to somewhere like room io, user may do anything between the ctx.connect and session.start(), the handler may not be there when the audio is sent, also there might be a gap between the audio track subscribed (done in ctx.connect) and audio stream created from the track (in room_io.start()).
(If the pre-connect buffer is sent until the audio track got subscribed.)
|
Works fine for me (in a single participant scenario) 👍 |
|
yeah i think this needs to be automatic from the agent side so the developer only has to "turn it on" in one spot (which would be on the client). right now the client sets a participant attribute indicating this feature is enabled but I think we're going to move that to be a track feature instead. ideally the agents framework can read that and automatically do the right thing |
|
Token detection seems to work fine 👍 I'm leaving the final decision here to @bcherry (more moving parts) |
lukasIO
left a comment
There was a problem hiding this comment.
Please hold this for a moment, until we can get protocol level support for the pre connect buffer as part of the audio track features
| from ..agent import logger, utils | ||
|
|
||
| PRE_CONNECT_AUDIO_BUFFER_STREAM = "lk.agent.pre-connect-audio-buffer" | ||
| PRE_CONNECT_AUDIO_ATTRIBUTE = "lk.agent.pre-connect-audio" |
There was a problem hiding this comment.
we discussed this in the client team and the most reliable solution would be to use the audioTrackFeature enum on the publication itself to figure out if the preconnect buffer should be handled for that track
There was a problem hiding this comment.
@longcw are you fine with this alternative?
There was a problem hiding this comment.
I think it's okay, how to read that?
There was a problem hiding this comment.
class TrackInfo(_message.Message):
...
audio_features: _containers.RepeatedScalarFieldContainer[AudioTrackFeature]but the exact case hasn't been added yet, am I right @lukasIO?
There was a problem hiding this comment.
what is audio_features? pls let me know how to access it from python sdk when it's ready
There was a problem hiding this comment.
it's part of the track info object, might be that the rust layer is not yet exposing this
There was a problem hiding this comment.
I think it is mapped, the snippet above is copied from models.pyi
| self._timeout = timeout | ||
| self._max_delta_s = max_delta_s | ||
|
|
||
| self._buffers: dict[str, asyncio.Future[_PreConnectAudioBuffer]] = {} |
There was a problem hiding this comment.
instead of a dictionary with participant identity keys, it would make more sense to use the track id. We don't know the publication id yet on the client side, but the mediastreamtrack id should work, I think
There was a problem hiding this comment.
I don't think it's needed to distinguish the tracks from the same participant, we always read the first audio track from the participant.
There was a problem hiding this comment.
The track id in transcription makes it's hard to use IMO, I don't know if it's worth to add it here
There was a problem hiding this comment.
in what way does it make it harder to use?
There was a problem hiding this comment.
just so that this doesn't get lost after switching PRs, what's your thinking here @longcw ?
There was a problem hiding this comment.
I don't against it, but let's see what is the new protocol... for example if the buffer is bound with a track id, then it's fine to use track id as the key
There was a problem hiding this comment.
protocol PR was merged btw livekit/protocol#1057
There was a problem hiding this comment.
it's not added to the python sdk right? and can we update this example to use the track feature flag as well so I can test it livekit-examples/agent-starter-swift#16
|
separate the room io updates to #2167 |
No description provided.